Efficient Search in Document Image Collections
نویسندگان
چکیده
This paper presents an efficient indexing and retrieval scheme for searching in document image databases. In many non-European languages, optical character recognizers are not very accurate. Word spotting word image matching may instead be used to retrieve word images in response to a word image query. The approaches used for word spotting so far, dynamic timewarping and/or nearest neighbor search, tend to be slow. Here indexing is done using locality sensitive hashing (LSH) a technique which computes multiple hashes using word image features computed at word level. Efficiency and scalability is achieved by content-sensitive hashing implemented through approximate nearest neighbor computation. We demonstrate that the technique achieves high precision and recall (in the 90% range), using a large image corpus consisting of seven Kalidasa’s (a well known Indian poet of antiquity) books in the Telugu language. The accuracy is comparable to using dynamic time warping and nearest neighbor search while the speed is orders of magnitude better 20000 word images can be searched in milliseconds.
منابع مشابه
Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback
Keyword Spotting is a well-known method in document image retrieval. In this method, Search in document images is based on query word image. In this Paper, an approach for document image retrieval based on keyword spotting has been proposed. In proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method is used in this paper to imp...
متن کاملمرور مؤثر نتایج جستجوی تصاویر با تلخیص بصری و متنوع از طریق خوشهبندی
With unprecedented growth in production of digital images and use of multimedia references, requirement of image and subject search has been increased. Systematic processing of this information is a basic prerequisite for effective analysis, organization and management of it. Likewise, large collections of images have been made available on the Web and many search engines have provided the poss...
متن کاملRetrieval from Document Image Collections
This paper presents a system for retrieval of relevant documents from large document image collections. We achieve effective search and retrieval from a large collection of printed document images by matching image features at word-level. For representations of the words, profile-based and shape-based features are employed. A novel DTWbased partial matching scheme is employed to take care of mo...
متن کاملSupporting Keyword Search for Image Retrieval with Integration of Probabilistic Annotation
The ever-increasing quantities of digital photo resources are annotated with enriching vocabularies to form semantic annotations. Photo-sharing social networks have boosted the need for efficient and intuitive querying to respond to user requirements in large-scale image collections. In order to help users formulate efficient and effective image retrieval, we present a novel integration of a pr...
متن کاملContent-based document image retrieval in complex document collections
We address the problem of content-based image retrieval in the context of complex document images. Complex document are documents that typically start out on paper and are then electronically scanned. These documents have rich internal structure and might only be available in image form. Additionally, they may have been produced by a combination of printing technologies (or by handwriting); and...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007